Om, Chol Nam
- Neighborhood Loss for Age Estimation from Face Image Using Convolutional Neural Networks
Authors
Affiliations
1 Institute of Information Technology, High-Tech Research and Development Centre, Kim Il Sung University, KP
Source
ICTACT Journal on Image and Video Processing, Vol 13, No 1 (2022), Pagination: 2770-2774
Abstract
Convolutional Neural Networks (CNNs) are widely used to estimate age from face images. In many CNN applications, such as image classification, face recognition and other computer vision tasks, the cross-entropy loss serves as the supervision signal for training the model. However, the cross-entropy loss only enhances the separability of classes and ignores their correlation, which matters in the age estimation task. In this paper we propose a novel loss function, called neighborhood loss, which accounts for the correlation between classes in age estimation by modifying the standard cross-entropy loss. To evaluate the effectiveness of the proposed neighborhood loss, we present a CNN architecture based on residual units. Through experiments, we show that the neighborhood loss yields superior performance compared to prior work on age estimation.
Keywords
Age Estimation, Neighborhood Loss, Convolutional Neural Network.
References
- M. Riesenhuber and T. Poggio, “Hierarchical Models of Object Recognition in Cortex”, Nature Neuroscience, Vol. 2, No. 11, pp. 1019-1025, 1999.
- K.H. Liu, T.J. Liu, H.H. Liu and S.C. Pei, “Facial Makeup Detection via Selected Gradient Orientation of Entropy Information”, Proceedings of IEEE International Conference on Image Processing, pp. 4067-4071, 2015.
- T. Ahonen, A. Hadid and M. Pietikainen, “Face Description with Local Binary Patterns: Application to Face Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, pp. 2037-2041, 2006.
- K.M. He, X.Y. Zhang, S.Q. Ren and J. Sun, “Identity Mappings in Deep Residual Networks”, Proceedings of European Conference on Computer Vision, pp. 630-645, 2016.
- F. Schroff, D. Kalenichenko and J. Philbin, “FaceNet: A Unified Embedding for Face Recognition and Clustering”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 815-823, 2015.
- W. Liu, “ImageNet Large Scale Visual Recognition Challenge (ILSVRC) Overview”, Available at https://www.image-net.org/challenges/LSVRC/2016/index.php, Accessed at 2016.
- W. Liu, “ImageNet Large Scale Visual Recognition Challenge (ILSVRC) Overview”, Available at https://image-net.org/challenges/LSVRC/2017/ , Accessed at 2017.
- Microsoft Face API, Available at: http://microsoft.com/cognitive-services/en-us/faceapi, Accessed at 2021.
- Face++, Available at: http://www.faceplusplus.com/demo-detect/, Accessed at 2021.
- Software Development Kit, Available at: http://uxand.com, Accessed at 2021.
- G. Levi and T. Hassner, “Age and Gender Classification using Convolutional Neural Networks”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-13, 2015.
- H.F. Yang, B.Y. Lin, K.Y. Chang and C.S. Chen, “Automatic Age Estimation from Face Images via Deep Ranking”, Proceedings of British Machine Vision Conference, pp. 1-8, 2015.
- R. Rothe, R. Timofte and L.V. Gool, “DEX: Deep Expectation of Apparent Age from a Single Image”, Proceedings of International Conference on Computer Vision Workshop, pp. 252-257, 2015.
- J.K. Deng, J. Guo, Y.X. Zhou, J.K. Yu, I. Kotsia and S. Zafeiriou, “RetinaFace: Single-Stage Dense Face Localisation in the Wild”, Proceedings of International Conference on Computer Vision and Pattern Recognition, pp. 1-7, 2019.
- IMDB Face Dataset, Available at: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_crop.tar, Accessed at 2022.
- WIKI Face Dataset, Available at: https://data.vision.ee.ethz.ch/cvl/rrothe/imdbwiki/static/wiki_crop.tar, Accessed at 2021.
- G. Panis, A. Lanitis, N. Tsapatsoulis and T.F. Cootes, “Overview of Research on Facial Ageing using the FG-NET Ageing Database”, IET Biometrics, Vol. 5, No. 2, pp. 37-46, 2016.
- K. Ricanek and T. Tesafaye, “Morph: A Longitudinal Image Database of Normal Adult Age-Progression”, Proceedings of International Conference on Automatic Face and Gesture Recognition, pp. 1-8, 2006.
- B.C. Chen, C.S. Chen and W.H. Hsu, “Face Recognition and Retrieval using Cross-Age Reference Coding with Cross-Age Celebrity Dataset”, IEEE Transactions on Multimedia, Vol. 17, No. 6, pp. 804-815, 2015.
- K.M. He, X.Y. Zhang, S.Q. Ren and J. Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, Proceedings of IEEE International Conference on Computer Vision, pp. 1026-1034, 2015.
- H. Han, C. Otto, X.M. Liu and A.K. Jain, “Demographic Estimation from Face Images: Human vs Machine Performance”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 37, No. 6, pp. 1148-1161, 2015.
- K. Chen, S.G. Gong, T. Xiang and C.C. Loy, “Cumulative Attribute Space for Age and Crowd Density Estimation”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2467-2474, 2013.
- G.D. Guo, Y. Fu, C.R. Dyer and T.S. Huang, “Image-Based Human Age Estimation by Manifold Learning and Locally Adjusted Robust Regression”, IEEE Transactions on Image Processing, Vol. 17, No. 7, pp. 1178-1188, 2008.
- K.Y. Chang, C.S. Chen and Y.P. Hung, “Ordinal Hyperplanes Ranker with Cost Sensitivities for Age Estimation”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 585-592, 2011.
- X.L. Wang, R. Guo and C. Kambhamettu, “Deeply-Learned Feature for Age Estimation”, Proceedings of IEEE Winter Conference on Applications of Computer Vision, pp. 534-541, 2015.
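The abstract above describes modifying cross-entropy so that neighboring age classes share supervision, but does not give the exact formulation. One common way to encode such inter-class correlation is to replace the one-hot target with a Gaussian-weighted soft label over nearby ages; the sketch below illustrates that idea only. The Gaussian weighting and the `sigma` parameter are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def neighborhood_soft_labels(true_age, num_classes, sigma=2.0):
    # Gaussian-weighted soft labels centred on the true age class,
    # so adjacent ages share probability mass (hypothetical form;
    # the paper's exact weighting is not given in the abstract).
    ages = np.arange(num_classes)
    weights = np.exp(-0.5 * ((ages - true_age) / sigma) ** 2)
    return weights / weights.sum()

def neighborhood_loss(logits, true_age, sigma=2.0):
    # Cross-entropy computed against the neighborhood-smoothed
    # target instead of a one-hot label.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    target = neighborhood_soft_labels(true_age, len(logits), sigma)
    return -np.sum(target * np.log(probs + 1e-12))
```

Unlike plain cross-entropy, this loss penalizes a prediction of age 70 more heavily than a prediction of age 32 when the true age is 30, because nearby classes carry target mass.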
- Multichannel Speech Enhancement of Target Speaker Based on Wakeup Word Mask Estimation with Deep Neural Network
Authors
Affiliations
1 Institute of Information Technology, Hightech Research & Development Center Kim Il Sung University, Pyongyang, KP
Source
International Journal of Advanced Networking and Applications, Vol 15, No 1 (2023), Pagination: 5754-5759
Abstract
In this paper, we address a multichannel speech enhancement method based on wakeup word mask estimation using a Deep Neural Network (DNN). The wakeup word is an important clue to the identity of the target speaker. We use a DNN to estimate the wakeup word mask and the noise mask, and apply them to separate the mixed wakeup word signal into the target speaker’s speech and background noise. A Convolutional Recurrent Neural Network (CRNN) is used to exploit both short- and long-term time-frequency dependencies of sequences such as speech signals. Generalized Eigenvector (GEV) beamforming estimates a spatial filter from the masks to enhance the target speaker’s subsequent speech commands and suppress undesirable noise. Experimental results show that the proposed method is more robust to noise, improving both the Signal-to-Noise Ratio (SNR) and speech recognition accuracy.
Keywords
Multichannel Speech Enhancement, Wakeup Word, Mask Estimation, Beamforming, Deep Neural Network (DNN).
References
- B.Y. Xia, and C.C. Bao, Speech enhancement with weighted denoising auto-encoder, Proc. 14th Annual Conf. of the International Speech Communication Association, Lyon, France, 2013, 3411–3415.
- J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, BLSTM supported GEV beamformer front-end for the 3rd CHIME challenge, Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, Scottsdale, AR, 2015, 444-451.
- B.D. Van Veen, and K.M. Buckley, Beamforming: a versatile approach to spatial filtering, IEEE Acoustics, Speech and Signal Processing Magazine, 5(2), 1988, 4-24.
- S. Doclo, W. Kellermann, S. Makino, and S. Nordholm, Multichannel signal enhancement algorithms for assisted listening devices, IEEE Signal Processing Magazine, 32(2), 2015, 18-30.
- T. Hori, Z. Chen, H. Erdogan, J.R. Hershey, J. Le Roux, V. Mitra, and S. Watanabe, Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend, Computer Speech and Language, 46, 2017, 401-418.
- Y. Kida, D. Tran, M. Omachi, T. Taniguchi, and Y. Fujita, Speaker selective beamformer with keyword mask estimation, Proc. 2018 IEEE Workshop on Spoken Language Technology, Athens, Greece, 2018, 528-534.
- E. Warsitz, and R. Haeb-Umbach, Blind acoustic beamforming based on generalized eigenvalue decomposition, IEEE Transactions on Audio Speech & Language Processing, 15(5), 2007, 1529-1539.
- J. Heymann, L. Drude, and R. Haeb-Umbach, Neural network based spectral mask estimation for acoustic beamforming, Proc. 41st IEEE International Conf. on Acoustics, Speech and Signal Processing, Shanghai, PRC, 2016, 196–200.
- J. Heymann, L. Drude, and R. Haeb-Umbach, A generic neural acoustic beamforming architecture for robust multi-channel speech processing, Computer Speech & Language, 46, 2017, 374-385.
- L. Yin, H. Ying, L.D. Kun, L. Rui, and Y.M. Hao, Chinese sign language recognition based on two-stream CNN and LSTM network, International Journal of Advanced Networking and Applications, 14(6), 2023, 5666-5671.
- P. Elechi, E. Okowa, and O.P. Illuma, Analysis of a SONAR detecting system using multi-beamforming algorithm, International Journal of Advanced Networking and Applications, 14(5), 2023, 5596-5601.
- D. Amodei, S. Ananthanarayan, R. Anubhai, J.L. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, and Q. Cheng, Deep speech 2: End-to-end speech recognition in English and Mandarin, Proc. 33rd International Conf. on Machine Learning, New York, NY, 2016.
- Y.B. Zhou, C.M. Xiong, and R. Socher, Regularization techniques for end-to-end speech recognition, Patent, San Francisco, CA, US, US20190130896A1, 2019.
- F.Y. Hou, L. Xie, and Z.H. Fu, Investigating neural network based query-by-example keyword spotting approach for personalized wake-up word detection in Mandarin Chinese, Proc. 10th International Symposium on Chinese Spoken Language Processing, Tianjin, PRC, 2017.
- G.G. Chen, C. Parada, and G. Heigold, Small-footprint keyword spotting using deep neural networks, Proc. 2014 IEEE International Conf. on Acoustics, Speech and Signal Processing, Florence, Italy, 2014.
- Y.D. Zhang, N. Suda, L.Z. Lai, and V. Chandra, Hello Edge: Keyword spotting on microcontrollers, arXiv: 1711.07128, 2017.
- T.N. Sainath, and C. Parada, Convolutional neural networks for small-footprint keyword spotting, Proc. 16th Annual Conf. of the International Speech Communication Association, Dresden, Germany, 2015.
- A. Krueger, E. Warsitz, and R. Haeb-Umbach, Speech enhancement with a GSC-like structure employing eigenvector-based transfer function ratios estimation, IEEE Transactions on Audio, Speech and Language Processing, 19(1), 2011, 206–219.
- H. Lucy, The MagPi (Raspberry Pi Trading Ltd, 30 Station Road, Cambridge, 2018).
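The GEV beamforming step described in the abstract above follows a standard recipe: compute mask-weighted speech and noise spatial covariance matrices, then take the principal generalized eigenvector as the beamformer weights. Below is a minimal single-frequency-bin sketch of that recipe; the array shapes, the diagonal-loading term, and all variable names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gev_beamformer(frames, speech_mask, noise_mask):
    # frames: (T, C) complex STFT values of one frequency bin over
    # T time frames and C microphones; masks: (T,) values in [0, 1]
    # (here assumed to come from the DNN mask estimator).
    # Mask-weighted spatial covariance matrices.
    phi_s = np.einsum('t,tc,td->cd', speech_mask, frames, frames.conj())
    phi_n = np.einsum('t,tc,td->cd', noise_mask, frames, frames.conj())
    phi_s /= speech_mask.sum() + 1e-12
    phi_n /= noise_mask.sum() + 1e-12
    num_mics = frames.shape[1]
    # The principal generalized eigenvector of (phi_s, phi_n)
    # maximizes the output SNR; small diagonal loading keeps
    # phi_n invertible.
    mat = np.linalg.solve(phi_n + 1e-6 * np.eye(num_mics), phi_s)
    vals, vecs = np.linalg.eig(mat)
    w = vecs[:, np.argmax(vals.real)]
    # Apply w^H to every frame to get the enhanced single channel.
    return frames @ w.conj()
```

Note that GEV weights are defined only up to an arbitrary complex scale, so practical systems usually add a post-filter (e.g. blind analytic normalization) to fix the gain; that step is omitted here.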
- A Gated Recurrent Unit Based Robust Voice Activity Detector
Authors
Affiliations
1 Institute of Information Technology, Hightech Research & Development Center Kim Il Sung University, Pyongyang, KP
Source
International Journal of Advanced Networking and Applications, Vol 15, No 2 (2023), Pagination: 5831-5836
Abstract
Voice activity detection (VAD), which identifies speech and non-speech segments in a speech signal, is a challenging task in noisy environments for various speech applications. In this paper, we propose a Gated Recurrent Unit (GRU) based VAD that uses MFCCs augmented with delta and delta-delta features in low signal-to-noise ratio (SNR) environments, to overcome the shortcomings of traditional VAD models. We compare the proposed method with traditional methods using speech signals corrupted by 10 types of noise at low SNRs. Experimental results reveal that the proposed GRU-based method is superior to the traditional methods in all the considered noisy environments, indicating that the GRU network improves speech detection performance.
Keywords
Voice Activity Detection, Deep Neural Network, Recurrent Neural Network, Gated Recurrent Unit.
References
- S.F. Boll, Suppression of Acoustic Noise in Speech Using Spectral Subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2), 1979, 113-120.
- A. Benyassine, E. Shlomot, H.Y. Su, D. Massaloux, C. Lamblin, and J.P. Petit, ITU-T Recommendation G.729 Annex B: a Silence Compression Scheme for Use with G.729 Optimized for V.70 Digital Simultaneous Voice and Data Applications, IEEE Communications Magazine, 35(9), 1997, 64-73.
- S.B. Tong, N.X. Chen, Y.M. Qian, and K. Yu, Evaluating VAD for Automatic Speech Recognition, Proc. 12th International Conf. on Signal Processing, Hangzhou, PRC, 2014, 2308–2314.
- L. Rabiner, and M.R. Sambur, An Algorithm for Determining the Endpoints of Isolated Utterances, Bell System Technical Journal, 54(2), 1975, 297-315.
- J. Ramirez, J.C. Segura, C. Benitez, A. de la Torre, and A. Rubio, Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information, Speech Communication, 42(3-4), 2004, 271-287.
- X.K. Yang, L. He, D. Qu, and W.Q. Zhang, Voice Activity Detection Algorithm Based on Long-Term Pitch Information, EURASIP Journal on Audio, Speech and Music Processing, 2016:14, 2016, 1-9.
- Y.N. Ma, and A. Nishihara, Efficient Voice Activity Detection Algorithm Using Long-Term Spectral Flatness Measure, EURASIP Journal on Audio, Speech and Music Processing, 2013:21, 2013, 1-18.
- K. Ishizuka, T. Nakatani, M. Fujimoto, and N. Miyazaki, Noise Robust Voice Activity Detection Based on Periodic to Aperiodic Component Ratio, Speech Communication, 52(1), 2010, 41-60.
- J.S. Sohn, N.S. Kim, and W.Y. Sung, A Statistical Model-Based Voice Activity Detection, IEEE Signal Processing Letters, 6(1), 1999, 1-3.
- E.Q. Dong, G.Z. Liu, Y.T. Zhou, and X.D. Zhang, Applying Support Vector Machines to Voice Activity Detection, Proc. 6th International Conf. on Signal Processing, Beijing, PRC, 2002, 1124–1127.
- T. Kinnunen, E. Chernenko, M. Tuononen, P. Fränti, and H.Z. Li, Voice Activity Detection Using MFCC Features and Support Vector Machine, Proc. International Conf. on Speech and Computer, 2007, 556–561.
- Q.H. Jo, J.H. Chang, J.W. Shin, and N.S. Kim, Statistical Model-Based Voice Activity Detection Using Support Vector Machine, IET Signal Processing, 3(3), 2009, 205-210.
- G. Ferroni, R. Bonfigli, E. Principi, S. Squartini, and F. Piazza, A Deep Neural Network Approach for Voice Activity Detection in Multi-Room Domestic Scenarios, Proc. International Joint Conf. on Neural Networks, Killarney, Ireland, 2015, 1–8.
- X.L. Zhang, and D.L. Wang, Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24(2), 2016, 252-264.
- M. Espi, M. Fujimoto, K. Kinoshita, and T. Nakatani, Exploiting Spectro-Temporal Locality in Deep Learning Based Acoustic Event Detection, EURASIP Journal on Audio, Speech and Music Processing, 2015:26, 2015, 1-12.
- S.M. Valentin, N.P. Tatiana, and A.P. Alexey, Robust Voice Activity Detection with Deep Maxout Neural Networks, Modern Applied Science, 9(8), 2015, 153-159.
- X.L. Zhang, and J. Wu, Deep Belief Networks Based Voice Activity Detection, IEEE Transactions on Audio, Speech and Language Processing, 21(4), 2013, 697-710.
- S.Y. Chang, B. Li, G. Simko, T.N. Sainath, A. Tripathi, A. van den Oord, and O. Vinyals, Temporal Modeling using Dilated Convolution and Gating for Voice-Activity-Detection, Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, Calgary, Canada, 2018, 5549–5553.
- A. Sehgal, and N. Kehtarnavaz, A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection, IEEE Access, 21, 2018, 9017-9026.
- M. Lavechin, M.P. Gill, R. Bousbib, H. Bredin, and L.P. Garcia-Perera, End-to-End Domain-Adversarial Voice Activity Detection, Proc. Conference of the International Speech Communication Association, Shanghai, PRC, 2020, 3685–3689.
- T.J. Xu, H. Zhang, and X.L. Zhang, Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection, Proc. Conference of the International Speech Communication Association, Shanghai, PRC, 2020, 3675–3679.
- Z.P. Zheng, J.Z. Wang, N. Cheng, J. Luo, and J. Xiao, MLNET: an Adaptive Multiple Receptive-Field Attention Neural Network for Voice Activity Detection, Proc. Conference of the International Speech Communication Association, Shanghai, PRC, 2020, 3695–3699.
- T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, Recurrent Neural Network Based Language Model, Proc. Conference of the International Speech Communication Association, Makuhari, Japan, 2010, 1045–1048.
- S. Dwijayanti, K. Yamamori, and M. Miyoshi, Enhancement of Speech Dynamics for Voice Activity Detection using DNN, EURASIP Journal on Audio, Speech and Music Processing, 2018:10, 2018, 1-15.
- K.H. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, Learning Phrase Representations using RNN Encoder-decoder for Statistical Machine Translation, arXiv preprint, arXiv:1406.1078, 2014.
- S. Hochreiter, and J. Schmidhuber, Long Short-Term Memory, Neural Computation, 9(8), 1997, 1735-1780.
- F.A. Gers, N.N. Schraudolph, and J. Schmidhuber, Learning Precise Timing with LSTM Recurrent Networks, Journal of Machine Learning Research, 3(1), 2003, 115-143.
- J.Y. Chung, C. Gulcehre, K.H. Cho, and Y. Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv preprint, arXiv:1412.3555, 2014.
- M. Schuster, and K.K. Paliwal, Bidirectional Recurrent Neural Networks, IEEE Transactions on Signal Processing, 45(11), 1997, 2673-2681.
- Noisex-92 Database, Rice University, Available at: http://spib.linse.ufsc.br/noise.html. Accessed on 22 Feb 2017.
- J.L. Ba, J.R. Kiros, and G.E. Hinton, Layer Normalization, arXiv preprint, arXiv:1607.06450, 2016.
- J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, and N.L. Dahlgren, Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM, NIST Interagency/Internal Report, NISTIR-4930, NIST, Gaithersburg, 1993.
- 100 Nonspeech Environmental Sounds, Available at: http://www.pudn.com/Download/item/id/3457634.html, Accessed at 2018.
- D. Kingma, and J. Ba, Adam: a Method for Stochastic Optimization, arXiv preprint, arXiv:1412.6980, 2014.
- R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd edn, Wiley-Interscience, New York, 2001.
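The MFCC, delta, and delta-delta input features described in the GRU-based VAD abstract above are conventionally computed with the regression-based delta formula. The sketch below illustrates that computation; the window size `N=2` and edge padding are common defaults assumed here, not values stated by the paper:

```python
import numpy as np

def delta_features(feats, N=2):
    # Standard regression-based delta computation over a window of
    # +/- N frames (the common HTK-style formula; the paper does
    # not state its exact window size).
    T, _ = feats.shape
    padded = np.pad(feats, ((N, N), (0, 0)), mode='edge')
    denom = 2 * sum(n * n for n in range(1, N + 1))
    delta = np.zeros_like(feats)
    for n in range(1, N + 1):
        delta += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return delta / denom

def augment_mfcc(mfcc):
    # Stack static MFCCs with delta and delta-delta to form the
    # augmented feature vector fed to the GRU.
    d1 = delta_features(mfcc)
    d2 = delta_features(d1)
    return np.concatenate([mfcc, d1, d2], axis=1)
```

For a D-dimensional MFCC sequence of T frames this yields a (T, 3D) feature matrix, which is the usual shape of the "MFCC + delta + delta-delta" input the abstract refers to.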